In this practical, you will make a GitHub repository, edit a markdown text file, and track changes in this document using Git. You can do this using a desktop interface for Git, and/or the command line; instructions for both are given.
A repository is essentially the same as a directory: a folder, containing your files for one project. It is not limited to code; it can also store images, text files, etc. The idea is that, a bit like Dropbox, there will be a copy of this folder on the GitHub website, which can be copied to multiple different computers. The big plus is that Git was designed for version control, keeping “snapshots” of each point in your repository’s history without you having to have 20 different copies of each file. This allows you to look back or return to previous versions.
As a member of the jknightlab GitHub group, you can have repositories under your account that only you can see, as well as repositories associated with the lab account that everyone can access. For example, I have personal repositories for my thesis and for odds and ends that don’t go anywhere else, and in the jknightlab group I use the “GAiNS” and “CardiacSurgery” repositories. All of these are synced to my work computer and my laptop, and other people on the GAinS project can also have copies. These can also be public or private repositories, controlling whether people outside the group can see them.
This is the other reason to use Git: it’s a bit like a shared electronic lab book. I have a record of everything computer-based that I’ve tracked with Git that I can look back to, other people (current and future) on the project can look at this, and if anyone in the group wants to use similar approaches on different datasets, they can access my notes easily. This is not limited to code; there should be at least descriptions of different datasets, how they were generated, and where the data is stored. If you look quickly at the GAinS repository, you can see 5 different people have contributed to it, documenting the different aspects of the project. This might look like a lot of work, but it wasn’t all done at once. Using Git probably adds about 5 minutes’ work to my day, and in the long run has saved a huge amount of time and effort.
Open a web browser and go to the GitHub website
Sign in and you should see something like this:
Click on the green “New repository” button under “Your repositories”
Fill in the details: if you want you can make a repository that you will continue to use, or just call it “Git_Practice” and delete it at the end. Select your username as the owner (unless you are making a repository for a lab project not already included in the lab account).
Tick “initialize this repository with a README” and click “Create repository”
Click on “README.md” in the file list. This will open the README file (the page will look pretty similar as it is the only file in your repository at the moment)
To edit the file, click the pencil symbol at the top right.
# GitHub_and_Rmd_Introduction
This is my practice markdown document.
Use a blank line to separate paragraphs
## A second level header
A list:
- item one
- item two
and a numbered list
1. you
2. get
3. the point
### A third level header
**Some bold text** and _some italic text_
And a [link](http://kbroman.org/knitr_knutshell/pages/markdown.html)
Commit your changes. This is a fancy (quicker) way of saying “save the file, but remember what has been changed from the previous version”. Give the commit a short, descriptive name so that if you ever want to find or revert to a previous version of your file, you can identify it easily. You can also add a longer, optional description to keep track of what you are doing.
If you navigate back to the repository home by clicking on its name at the top of the page, you will see that your commit is now logged and the README displayed has been updated. First, click on the clock with “2 commits” next to it underneath the repository description. This shows you the history of your changes. Click on the name of the latest commit, and you will see the old and new versions of your file side by side (diff view - you can also have a unified view, which is more like Word track changes).
The GitHub website has a lot of great features, but really what you want is to edit your files as normal on your computer, and then update the repository on GitHub with your changes. To do this, you first need a local copy, or clone, of your repo.
GitHub Desktop is a GUI (graphical user interface) for the Git software, which was originally just used through the command line. You can now do the same things in this programme, with the benefit of it being easier to see what is happening. When you download GitHub Desktop, it installs Git on your computer, so you can also use it without the GUI. Another way to use Git (which I have very limited experience with) is called GitKraken - you might find you prefer this. The screenshots are from the Windows version, but I’ve noted where the Mac version looks different.
(If you want to try both the GUI and command line approaches, you can delete and reclone your repo, and make another small change to your document to commit.)
Open GitHub Desktop
If you haven’t used this before, you will have to configure Git i.e. tell it your login details so it can sync with GitHub. Click on the settings button on the top right and select Options. I have mine set up as below.
Click on the plus symbol on the top left of the window. First, we will clone the repository we have been using so far from GitHub. Click the Clone tab, find your repository, and clone it. Choose where you want to copy it to on your computer, and it will be copied there. Navigate to it in Windows Explorer/Finder and open the folder, and you will find the README file you made earlier.
Open Git Bash/Terminal/sign into Galahad
If you haven’t used this before, you will have to configure Git i.e. tell it your login details so it can sync with GitHub. To do this, use
git config --global user.name "Your Name Here"
git config --global user.email "your_email@well.ox.ac.uk"
Navigate to the directory where you want to keep the repository (cd /MyPath) and clone it:
git clone https://github.com/jknightlab/GitHub_and_Rmd_Introduction.git
Move into the new folder and use ls to see the README.md file that you made.
To see the history of your commits, use git log. To look in detail at a particular commit, use git show [commit].
If you carry on using the command line, you can set it up so that you don’t have to enter your user name and password every time.
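As a rough sketch of what git log and git show display, the script below builds a throwaway repository with two commits and inspects its history. The sandbox directory, file contents and commit messages are made up for illustration, and `git init -b` assumes Git 2.28 or newer.

```shell
# Work in a throwaway directory so nothing real is touched (illustrative setup).
sandbox=$(mktemp -d)
cd "$sandbox"
# -b master names the first branch; this option assumes Git 2.28 or newer.
git init -q -b master practice
cd practice
# Identity for this sandbox repository only (no --global); placeholder values.
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"

echo "This is my practice markdown document." > README.md
git add README.md
git commit -q -m "Add README"

echo "An extra line." >> README.md
git commit -q -a -m "Add an extra line to README"

# One line per commit, newest first.
git log --oneline

# Author, date, message and diff of the latest commit.
git show HEAD
```

HEAD is shorthand for the latest commit on the current branch; any commit identifier printed by git log can be used in its place.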
Open your README.md file in RStudio (the second half of this tutorial will go into R Markdown in more detail). The markdown should be displayed in the top left panel, which is your text editor.
Make a small change to your README file, e.g.
An **extra** _line_ in the file with `a code block`
You will also see that a file called README.html has appeared. RStudio created this when you previewed your markdown file, and this is really the output of a markdown document. If you look in the folder, you will find this file, and you can open it and look at it in your web browser.
As on the website, fill in a commit summary and description at the bottom of the window. It is probably better practice to do the edit and the ignore as two separate commits, as they are unrelated changes. I try to make each commit a single distinct task, which may involve editing several files but all with the same goal in mind.
Click “Commit to master”. The changes will vanish from the “Changes” tab, and the commit is added to the “History” tab.
We have now changed our minds about the change we made and want to revert to the former version. It is not possible to undo a commit on GitHub (well, you can just edit the file again of course) but on your computer you can return to a previous version of your files. To do this, click on the commit with the minor edit of your README file, and click “Revert” on the top right. You will see that this creates another commit, so everything is tracked - you can go back and forth as much as you want.
Use git status to see what files have been changed in your local repository. Use git diff to see the changes.
The equivalent of ticking the files you want to add to a commit is git add [file name]. To ignore a file, create a file called “.gitignore” and add the file name to it (e.g. use vim)
vim .gitignore
Shift + i
*.html
ESC
:wq
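If vim feels awkward, the same .gitignore entry can be written with a single shell command. The sketch below is only an illustration: it assumes a throwaway repository, uses a made-up README.html as a stand-in for the file RStudio generates, and relies on `git init -b`, which needs Git 2.28 or newer.

```shell
# Throwaway repository (illustrative paths; -b assumes Git 2.28 or newer).
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q -b master practice
cd practice

echo "This is my practice markdown document." > README.md
# Stand-in for the preview file that RStudio generates.
echo "<html></html>" > README.html

# Before ignoring: README.html and README.md are both listed as untracked.
git status --short

# Append the pattern to .gitignore (the file is created if it does not exist).
echo "*.html" >> .gitignore

# After ignoring: README.html is no longer listed; .gitignore appears instead.
git status --short
```

The .gitignore file is itself a normal file in the repository, so it can be added and committed like any other.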
To make a commit, use git commit -m [descriptive message]. All the changes that you have **staged** using git add will be in this commit.
You have now committed these changes locally. Use git status again to check this.
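Put together, the status, diff, add and commit steps above might look like the following. This is a sketch in a throwaway repository rather than your real one; the directory, file contents and commit messages are illustrative, and `git init -b` assumes Git 2.28 or newer.

```shell
# Throwaway repository with one initial commit (illustrative setup).
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q -b master practice   # -b assumes Git 2.28 or newer
cd practice
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"
echo "This is my practice markdown document." > README.md
git add README.md
git commit -q -m "Add README"

# Make a small edit, then inspect it.
echo "An **extra** _line_ in the file" >> README.md
git status --short      # README.md is listed as modified but not yet staged
git diff                # shows the added line

# Stage the edit and commit it with a descriptive message.
git add README.md
git commit -q -m "Add an extra line to README"

# Prints nothing: the working directory matches the last commit.
git status --short
```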
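To undo this commit, use git reset [commit]. This moves your branch back to the named commit, removing every later commit from the history; by default the edits themselves stay in your files as uncommitted changes.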
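The command line has two ways of undoing a commit, and a small sandbox shows the difference. git reset moves the branch back, while git revert adds a new commit that undoes an earlier one, which is what the Revert button in GitHub Desktop does. The repository, file and messages below are illustrative, and `git init -b` assumes Git 2.28 or newer.

```shell
# Throwaway repository with two commits (illustrative setup).
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q -b master practice   # -b assumes Git 2.28 or newer
cd practice
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"
echo "This is my practice markdown document." > README.md
git add README.md
git commit -q -m "Add README"
echo "An extra line." >> README.md
git commit -q -a -m "Add an extra line to README"

# git reset moves the branch back to the named commit (HEAD~1 is the commit
# before the latest one). The later commit leaves the history, but by default
# the edit stays in the file as an uncommitted change.
git reset -q HEAD~1
git log --oneline        # only "Add README" remains
git status --short       # README.md is listed as modified

# Commit the edit again so that there is something to revert.
git commit -q -a -m "Add an extra line to README"

# git revert keeps the history and adds a new commit that undoes the named one.
git revert --no-edit HEAD
git log --oneline        # three commits: add, edit, revert
```

Reverting is usually the safer choice once a commit has been pushed, because nothing is removed from the shared history.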
Finally, we want to upload our local changes to GitHub. This is called pushing. With GitHub Desktop, this is done by pressing the “Sync” button. Your local repository is now synchronised with your online one. Navigate there to check this. If you or someone else then wants to work on a copy of these files on another computer, they can clone the repository as we did earlier.
With the command line, use git push. You might get warnings because you have not specified a branch. We only have one, but to be more precise you can use git push origin master, where origin is the online version and master is your local branch. To download changes from GitHub, use git pull.
Pull, as you might imagine, does download changes to your repository from GitHub. HOWEVER, this can occasionally cause problems, as this process also involves merging those changes with any changes you have made locally. Behind the scenes, pull is actually a combination of two separate tasks: fetch (downloading changes from GitHub into a local copy of the online branch, called origin/master) and merge (merging them into your local master branch).
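To see the two halves of a pull separately, the sketch below runs fetch and merge by hand. So that it works without a network connection or an account, a bare repository on disk stands in for GitHub; that stand-in, the "laptop" and "desktop" directory names and the commit messages are all assumptions for illustration, and `git init -b` needs Git 2.28 or newer.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# A bare repository on disk plays the role of GitHub (assumption for the demo).
git init -q --bare -b master central.git   # -b assumes Git 2.28 or newer

# "laptop": make a first commit and push it to the central copy.
git init -q -b master laptop
cd laptop
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"
git remote add origin "$sandbox/central.git"
echo "This is my practice markdown document." > README.md
git add README.md
git commit -q -m "Add README"
git push -q origin master

# "desktop": a second clone of the same repository.
cd "$sandbox"
git clone -q central.git desktop

# Back on the laptop, push one more commit.
cd "$sandbox/laptop"
echo "An extra line." >> README.md
git commit -q -a -m "Add an extra line to README"
git push -q origin master

# On the desktop, do by hand what git pull does in one go.
cd "$sandbox/desktop"
git fetch -q origin            # download only: updates origin/master
git log --oneline master       # master still has one commit
git merge -q origin/master     # bring the downloaded commit into master
git log --oneline master       # master now has two commits
```

Running fetch on its own is a harmless way to check what has changed online before deciding to merge.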
GitHub Desktop only has a “Sync” button, which simultaneously pushes and pulls changes. In some ways this is simpler, as with what we have done so far this will work just fine. It is also fine most of the time when you are the only person working on a file (which is mostly how I use GitHub - I have shared repositories, but we generally don’t edit the same file). The only time I’ve had issues with this is when I’ve edited a file on my laptop and uploaded the changes to GitHub, then edited the same file on my desktop without first pulling these changes. When I try to upload those changes, Git understandably doesn’t know how to combine them, and doesn’t want to mess up the online version, as that is the central copy. It will then tell me that I have a merge conflict.
It looks scary when this happens, but it is very easy to deal with. Git puts both sets of changes into my local file, surrounded by tags marking which sections conflict. I can then edit this file so that it has the changes I want, and once I’ve deleted the merge conflict tags, I can commit the changes as normal. Obviously, I can avoid this by remembering to pull down changes before I start working on the file. Git is also very good at coping with multiple changes to the same file when they are in different sections.
The same thing can happen when using the command line, but you have a bit more control (though it looks more complicated). For example, if you try to use git push and get an error message because your local copy has diverged from the central repository, you can use git pull --rebase origin master. This means “pull the upstream changes first, then add all your local commits”. If there is a direct conflict, you will be told and can edit the file as above.
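The rejected push and the rebase that fixes it can be reproduced in a sandbox. As a sketch only: a bare repository on disk stands in for GitHub, the two clones are called "laptop" and "desktop", and the note files and commit messages are invented for the example. `git init -b` assumes Git 2.28 or newer.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
# Bare repository standing in for GitHub (assumption for the demo).
git init -q --bare -b master central.git   # -b assumes Git 2.28 or newer

# "laptop": first commit, pushed to the central copy.
git init -q -b master laptop
cd laptop
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"
git remote add origin "$sandbox/central.git"
echo "This is my practice markdown document." > README.md
git add README.md
git commit -q -m "Add README"
git push -q origin master

# "desktop": second clone, made before the next laptop change.
cd "$sandbox"
git clone -q central.git desktop

# Laptop: push a change that the desktop does not know about.
cd "$sandbox/laptop"
echo "Notes from the laptop." > laptop_notes.md
git add laptop_notes.md
git commit -q -m "Add laptop notes"
git push -q origin master

# Desktop: commit a different change without pulling first.
cd "$sandbox/desktop"
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"
echo "Notes from the desktop." > desktop_notes.md
git add desktop_notes.md
git commit -q -m "Add desktop notes"

# The push is rejected because the two histories have diverged.
git push -q origin master || echo "push rejected, pulling first"

# Download the upstream commit, then replay the local commit on top of it.
git pull -q --rebase origin master
git push -q origin master
git log --oneline        # newest first: desktop notes, laptop notes, README
```

Because the two commits touch different files, the rebase finishes without any conflict to resolve.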
If you want to practice dealing with this:
Open your local README file and delete a word.
Go to the same file on GitHub and delete a different word in the same line.
Commit both changes.
Press Sync in GitHub Desktop.
You will get an error message, so click OK and go back to your file in RStudio to sort this out.
You will see something like this:
<<<<<<< HEAD
This is my practice document.
=======
This is my markdown document.
>>>>>>> origin/master
This is my unconflicted practice markdown document.
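The same conflict can be produced and resolved entirely on the command line. The sketch below is a hedged reconstruction rather than the exact steps of the practical: a bare repository on disk stands in for GitHub, the "laptop" clone plays the part of the online edit, and `git init -b` assumes Git 2.28 or newer.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
# Bare repository standing in for GitHub (assumption for the demo).
git init -q --bare -b master central.git   # -b assumes Git 2.28 or newer

git init -q -b master laptop
cd laptop
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"
git remote add origin "$sandbox/central.git"
echo "This is my practice markdown document." > README.md
git add README.md
git commit -q -m "Add README"
git push -q origin master

cd "$sandbox"
git clone -q central.git desktop

# Laptop: delete one word and push.
cd "$sandbox/laptop"
echo "This is my markdown document." > README.md
git commit -q -a -m "Remove the word practice"
git push -q origin master

# Desktop: delete a different word in the same line, without pulling first.
cd "$sandbox/desktop"
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"
echo "This is my practice document." > README.md
git commit -q -a -m "Remove the word markdown"

# Fetch and merge (the two halves of a pull); the merge stops with a conflict.
git fetch -q origin
git merge origin/master || echo "merge conflict in README.md"

# README.md now holds both versions between the conflict markers.
cat README.md

# Resolve by writing the version to keep, then stage and commit as normal.
echo "This is my unconflicted practice markdown document." > README.md
git add README.md
git commit -q -m "Resolve merge conflict in README"
git push -q origin master
```

In real use you would edit the file by hand in your text editor instead of overwriting it with echo; the commit afterwards is the same.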
An important feature of Git in general, though perhaps not for today, is that branches let you keep different sets of changes in parallel. You can have different people working on different aspects of the project at the same time. When they are done, they download the current version of the repository and merge their changes into it. You can even use branches just for yourself - the idea is that in software development, each feature should have its own branch, so that the master branch never has broken or half-written code. Most of us won’t be using Git in that complicated a way, so we can essentially ignore branches. Once people get used to using Git, we can talk more about this aspect of it if you would like.
You have now made a Git repository, cloned it to your computer, made changes both locally and online, committed these changes, and synchronised them. That’s everything you need to get going with Git - as I’ve mentioned, branching can be useful, but makes things unnecessarily complicated for now.
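For the curious, a branch takes only a few commands to try. This is a minimal sketch in a throwaway repository; the branch name add-dataset-notes, the file it adds and the commit messages are hypothetical, and `git init -b` assumes Git 2.28 or newer.

```shell
# Throwaway repository with one commit on master (illustrative setup).
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q -b master practice   # -b assumes Git 2.28 or newer
cd practice
git config user.name "Your Name Here"
git config user.email "your_email@well.ox.ac.uk"
echo "This is my practice markdown document." > README.md
git add README.md
git commit -q -m "Add README"

# Create a branch for one piece of work and switch to it.
git checkout -q -b add-dataset-notes
echo "Dataset notes (placeholder text)." > dataset_notes.md
git add dataset_notes.md
git commit -q -m "Add dataset notes"

# master is untouched until the branch is merged back.
git checkout -q master
ls                               # only README.md
git merge -q add-dataset-notes
ls                               # README.md and dataset_notes.md
```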
I use Git all the time - I generally open GitHub Desktop at the start of the day and hit Sync, then at lunch and before I leave, I commit any changes to my files that I want to keep (if I’m particularly conscientious, I commit changes through the day, and if I switch to working on a different project) and hit Sync again. It takes next to no time once a repository is set up, and it is GREAT for keeping track of things over time and across computers. It is also brilliant for sharing scripts and files within the group; however, this does require a certain amount of documentation to be done. For example, with R Markdown
Links:
A webinar showing GitHub Desktop
This tutorial uses the command line, and shows you how to turn an existing local directory into a Git repository: GitHub for Beginners
A couple of free online tutorials: Code School and Code Academy